Skip to content

Conversation

smcguire-cmu
Copy link
Contributor

@smcguire-cmu smcguire-cmu commented Sep 17, 2025

Updates suffix behavior in crossmatch and join method to take a kwarg suffix_method to determine the suffix behavior. The two options currently are all_columns and overlapping_columns. all_columns applies suffixes to all columns in the catalogs, while overlapping_columns only renames the conflicting names and logs a warning with these renames.

The default currently is all_columns to match our current behavior, but there is a future warning if the user doesn't explicitly set the suffix method that the default will change in the future.

Closes #951 and closes #911 and closes #74

Copy link

github-actions bot commented Sep 17, 2025

Before [969705a] After [c5b65b2] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_open_many_columns_all
1.68±0.06s 1.76±0.01s 1.05 benchmarks.time_open_many_columns_default
5.82±0s 5.92±0.01s 1.02 benchmarks.time_create_large_catalog
92.0±0.8ms 92.5±0.9ms 1.01 benchmarks.time_kdtree_crossmatch
25.0±0.06ms 24.9±0.1ms 1.00 benchmarks.time_box_filter_on_partition
581±6ms 582±2ms 1.00 benchmarks.time_open_many_columns_list
891±6ms 886±3ms 0.99 benchmarks.time_create_midsize_catalog
41.6±1ms 40.0±0.9ms 0.96 benchmarks.time_polygon_search

Some benchmarks failed or their performance regressed significantly.

Copy link

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.16%. Comparing base (3aa9fb1) to head (9420c0f).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1021      +/-   ##
==========================================
- Coverage   97.19%   97.16%   -0.04%     
==========================================
  Files          46       46              
  Lines        2638     2679      +41     
==========================================
+ Hits         2564     2603      +39     
- Misses         74       76       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@smcguire-cmu
Copy link
Contributor Author

I updated the code with your comments. While I was adding more tests I realized the metadata ra and dec columns weren't being updated properly, so I fixed that. The metadata also had some logic about carrying over default columns, but I remember users not liking that (If they load a catalog, xmatch, save, then load and half the columns don't load). So I just got rid of the default columns logic, but I can put it back in if you want.

Copy link
Contributor

@delucchi-cmu delucchi-cmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with the straightforward readable approach for now.

Copy link

github-actions bot commented Oct 2, 2025

Before [969705a] After [fb7d2ac] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_all_suffixes
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_overlapping_suffixes
failed failed n/a benchmarks.time_open_many_columns_all
1.70±0.06s 1.76±0.01s 1.04 benchmarks.time_open_many_columns_default
91.3±0.8ms 93.9±1ms 1.03 benchmarks.time_kdtree_crossmatch
39.8±0.3ms 40.5±0.5ms 1.02 benchmarks.time_polygon_search
5.79±0.01s 5.88±0s 1.01 benchmarks.time_create_large_catalog
879±6ms 875±4ms 1.00 benchmarks.time_create_midsize_catalog
581±7ms 579±3ms 1.00 benchmarks.time_open_many_columns_list
25.3±0.6ms 25.2±0.5ms 0.99 benchmarks.time_box_filter_on_partition

Some benchmarks failed or their performance regressed significantly.

Copy link

github-actions bot commented Oct 3, 2025

Before [5fe1775] After [f24dc14] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_all_suffixes
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_overlapping_suffixes
failed failed n/a benchmarks.time_open_many_columns_all
25.9±0.3ms 26.6±0.3ms 1.03 benchmarks.time_box_filter_on_partition
94.1±1ms 94.6±1ms 1.01 benchmarks.time_kdtree_crossmatch
1.71±0.05s 1.72±0.06s 1.01 benchmarks.time_open_many_columns_default
41.7±0.9ms 42.1±1ms 1.01 benchmarks.time_polygon_search
5.85±0.03s 5.79±0.03s 0.99 benchmarks.time_create_large_catalog
601±8ms 593±7ms 0.99 benchmarks.time_open_many_columns_list
885±5ms 866±4ms 0.98 benchmarks.time_create_midsize_catalog

Some benchmarks failed or their performance regressed significantly.

Copy link

github-actions bot commented Oct 7, 2025

Before [5fe1775] After [20d917c] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_all_suffixes
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_overlapping_suffixes
failed failed n/a benchmarks.time_open_many_columns_all
1.72±0.06s 1.75±0.01s 1.02 benchmarks.time_open_many_columns_default
5.81±0.01s 5.80±0.03s 1.00 benchmarks.time_create_large_catalog
587±2ms 586±2ms 1.00 benchmarks.time_open_many_columns_list
873±10ms 863±5ms 0.99 benchmarks.time_create_midsize_catalog
92.3±0.7ms 91.0±2ms 0.99 benchmarks.time_kdtree_crossmatch
25.5±0.3ms 24.9±0.1ms 0.97 benchmarks.time_box_filter_on_partition
41.1±0.6ms 40.0±0.6ms 0.97 benchmarks.time_polygon_search

Some benchmarks failed or their performance regressed significantly.

# Conflicts:
#	src/lsdb/dask/join_catalog_data.py
Copy link

github-actions bot commented Oct 8, 2025

Before [3aa9fb1] After [6016566] Ratio Benchmark (Parameter)
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_all_suffixes
failed failed n/a benchmarks.time_lazy_crossmatch_many_columns_overlapping_suffixes
failed failed n/a benchmarks.time_open_many_columns_all
36.7±0.7ms 37.7±1ms 1.03 benchmarks.time_polygon_search
23.2±0.3ms 23.5±0.3ms 1.01 benchmarks.time_box_filter_on_partition
563±2ms 566±2ms 1.01 benchmarks.time_open_many_columns_list
82.2±1ms 82.3±1ms 1.00 benchmarks.time_kdtree_crossmatch
1.53±0.01s 1.51±0.02s 0.99 benchmarks.time_open_many_columns_default
5.52±0.02s 5.43±0.02s 0.98 benchmarks.time_create_large_catalog
830±8ms 811±6ms 0.98 benchmarks.time_create_midsize_catalog

Some benchmarks failed or their performance regressed significantly.

@smcguire-cmu smcguire-cmu merged commit 5a1a431 into main Oct 8, 2025
13 checks passed
@smcguire-cmu smcguire-cmu deleted the sean/suffixes branch October 8, 2025 20:14
@camposandro camposandro mentioned this pull request Oct 10, 2025
24 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants